Most of the linguistic information that most people communicate is in the form of speech, and most people can speak much faster than they can convey linguistic information by any other means. Yet most people can read much faster than they can understand speech, even when the speech is recorded and artificially sped up without pitch distortion. And whereas recorded text can be visually scanned and searched quickly and easily, searching or scanning recorded speech is painfully tedious. Today's networked computer systems exacerbate this discrepancy: they can search enormous quantities of textual data in an instant, yet are useless at interpreting voice data. Compared to speech, text is also far easier to edit, organize, and otherwise process.
Accurate, fast, and affordable speech transcription could combine the advantages of speech and text; however, no existing solution meets all three of these criteria. Trained human dictation typists set the standard for accuracy but are slow and expensive. Skilled human stenographers are faster, but they take much longer to train and to master their craft, the output of their stenotype machines is ambiguous, and they cost even more than dictation typists. Automatic speech recognizers are the most affordable, but in the current state of the art their accuracy on the normal conversational speech of most speakers in most situations is unacceptably low for most purposes. Trained human voicewriters substitute their own clearly enunciated speech as input to automatic speech recognizers and then correct the remaining errors in the output, thereby matching the accuracy of typists while retaining much of the speed of automatic speech recognition; however, trained voicewriters are even more expensive than dictation typists.
The high cost of dictation typists and voicewriters reflects the scarcity of interested people with the requisite linguistic ability and hearing acuity, compounded by the long training needed to acquire the vocabulary, accuracy, and speed required for real-time transcription. Dictation typing additionally requires excellent manual dexterity and spelling aptitude, while voicewriting additionally requires excellent oral dexterity and elocutionary aptitude, as well as the necessary computer operation skills. Unlike typing, proficiency in voicewriting also entails mutual adaptation: the voicewriter is trained to speak so as to be accurately understood by the speech recognizer, while the speech recognizer is trained to accurately understand the voicewriter. Moreover, the speech recognizer must continue to learn new vocabulary and syntactic constructions along with the voicewriter. The obscurity of the voicewriting profession and the rarity of educational programs that teach it have further limited the talent pool.